Discrepancy Search with Reactive Policies for Planning
Author
Abstract
We consider a novel use of mostly-correct reactive policies. In classical planning, reactive policy learning approaches can learn good policies from solved trajectories of small problems, and such policies have been applied successfully to larger problems in the target domains. Because they are learned inductively, these policies are often mostly correct but commit errors in some fraction of the states. Discrepancy search was developed to exploit the structure of a heuristic function that is mostly correct. In this paper, we propose using machine-learned reactive policies inside discrepancy search to improve their performance. In our experiments on benchmark planning domains, the proposed approach outperformed both the learned policies themselves and policy rollout with the learned policies. As an extension, we consider using reactive policies in heuristic search: during each node expansion, we add to the search queue all the states that occur along the trajectory of the given policy from that node. Experiments show that this approach greatly improves the performance of heuristic search on benchmark planning domains.
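The core idea of the abstract, following a mostly-correct policy while allowing a bounded number of deviations from it, can be illustrated with a minimal sketch of limited discrepancy search. This is a generic illustration under stated assumptions, not the paper's implementation: the function `discrepancy_search`, its parameters, and the toy number-line domain are all hypothetical, and the policy is assumed to return applicable actions in preference order.

```python
from typing import Callable, List, Optional

# Hypothetical toy domain: walk from 0 to GOAL on a number line.
# State 3 is a dead end, so a policy that always prefers +1 fails on its own.
GOAL = 5
DEAD_END = 3


def is_goal(state: int) -> bool:
    return state == GOAL


def successor(state: int, action: int) -> Optional[int]:
    """Return the next state, or None if the action is not applicable."""
    if state == DEAD_END:
        return None
    nxt = state + action
    return nxt if nxt <= GOAL else None


def ranked_actions(state: int) -> List[int]:
    """Stand-in for a learned policy: actions in preference order (+1 first)."""
    return [1, 2]


def discrepancy_search(
    start: int,
    is_goal: Callable[[int], bool],
    successor: Callable[[int, int], Optional[int]],
    ranked_actions: Callable[[int], List[int]],
    max_depth: int,
    max_discrepancies: int,
) -> Optional[List[int]]:
    """Try plans with 0, 1, 2, ... deviations from the policy's first choice."""

    def probe(state: int, depth_left: int, budget: int, plan: List[int]):
        if is_goal(state):
            return list(plan)
        if depth_left == 0:
            return None
        for rank, action in enumerate(ranked_actions(state)):
            cost = 0 if rank == 0 else 1  # any non-preferred action is a discrepancy
            if cost > budget:
                break
            nxt = successor(state, action)
            if nxt is None:
                continue
            plan.append(action)
            found = probe(nxt, depth_left - 1, budget - cost, plan)
            plan.pop()
            if found is not None:
                return found
        return None

    for k in range(max_discrepancies + 1):
        plan = probe(start, max_depth, k, [])
        if plan is not None:
            return plan
    return None


policy_only = discrepancy_search(0, is_goal, successor, ranked_actions, 10, 0)
with_discrepancy = discrepancy_search(0, is_goal, successor, ranked_actions, 10, 1)
print(policy_only, with_discrepancy)
```

In this toy domain the policy alone walks into the dead end and finds no plan, while a single discrepancy (choosing +2 at state 2) is enough to reach the goal. The iteration over the discrepancy budget means plans that agree most closely with the policy are tried first.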
Similar resources
Application of Tabu Search to Optimal Placement of Distributed Generation and Reactive Power Sources
Introducing distributed generation into a power system can lead to numerous benefits including technical, economic, environmental, etc. To attain these benefits, distributed generators with proper rating should be installed at suitable locations. Given the similar effects of distributed generators and capacitor banks on operation indices of a distribution system, simultaneous assignment of best...
Learning Generalized Reactive Policies using Deep Neural Networks
We consider the problem of learning for planning, where knowledge acquired while planning is reused to plan faster in new problem instances. For robotic tasks, among others, plan execution can be captured as a sequence of visual images. For such domains, we propose to use deep neural networks in learning for planning, based on learning a reactive policy that imitates execution traces produced b...
Learning to Plan Probabilistically
This paper discusses the learning of probabilistic planning without a priori domain-specific knowledge. Unlike existing reinforcement learning algorithms that generate only reactive policies, and existing probabilistic planning algorithms that require a substantial amount of a priori knowledge in order to plan, we devise a two-stage bottom-up learning-to-plan process, in which first rei...
Using a new modified harmony search algorithm to solve multi-objective reactive power dispatch in deterministic and stochastic models
Optimal reactive power dispatch (ORPD) is a very important aspect of power system planning and is a highly nonlinear, non-convex optimization problem because it involves both continuous and discrete control variables. Since the power system has inherent uncertainty, this paper presents both deterministic and stochastic models for the ORPD problem in multi objective and sin...